//$$ newmatb.txt                           Design

Copyright (C) 1991,2,3,4: R B Davies

Please treat this as an academic publication. You can use the ideas but
please acknowledge in any publication. See also my publication in the
proceedings of the second (1994) object-oriented numerics conference,
published by Rogue Wave Software.


Notes on the design of the package
==================================

Contents
========

   What this package is for
   What size of matrices?
   Allow matrix expressions?
   Which matrix types?
   What element types?
   Naming convention
   Row and Column index ranges
   Structure of matrix objects
   Data storage - one block or several
   Data storage - by row or by column or other
   Storage of symmetric matrices
   Element access - method and checking
   Use iterators?
   Memory management - reference counting or status variable?
   Evaluation of expressions - use two stage method?
   How to overcome an explosion in number of operations
   Destruction of temporaries
   A calculus of matrix types
   Error handling
   Sparse matrices
   Complex matrices

I describe some of the ideas behind this package and some of the
decisions I needed to make, and give some details about the way it
works. You don't need to read this file in order to use the package.

I don't think it is obvious what is the best way of structuring a
matrix package, and I don't think you can figure this out with
"thought" experiments. Different people have to try out different
approaches, and someone else may have to figure out which is best. Or,
more likely, the ultimate packages will lift ideas from a variety of
trial packages. So I don't claim my package is an "ultimate" package,
but simply a trial of a number of ideas. I do hope it will meet the
immediate requirements of some people who need to carry out matrix
calculations.
What this package is for
------------------------

The package is used for the manipulation of matrices, including the
standard operations such as multiplication, as understood by numerical
analysts, engineers and mathematicians.

A matrix is a two dimensional array of numbers. However, very special
operations, such as matrix multiplication, are defined specifically
for matrices. This means that a "matrix" package tends to be different
from a general "array" package. I see a matrix package as providing
the following:

1.  Evaluation of matrix expressions in a form familiar to scientists
    and engineers. For example  A = B * (C + D.t());
2.  Access to the elements of a matrix;
3.  Access to submatrices;
4.  Common elementary matrix functions such as determinant and trace;
5.  A platform for developing advanced matrix functions such as
    eigenvalue analysis;
6.  Good efficiency and storage management;
7.  Graceful exit from errors.

It may also provide

8.  A variety of types of elements (eg real and complex);
9.  A variety of types of matrices (eg rectangular and symmetric).

In contrast an array package should provide

1'. Copying of arrays with a single instruction; possibly some
    arithmetic operations, say + - and scalar + - * /; it may provide
    matrix multiplication as a function;
2'. High speed access to elements directly and with iterators;
3'. Access to subarrays and rows (and columns?);
6'. Good efficiency and storage management;
7'. Graceful exit from errors;
8'. A variety of types of elements (eg real and complex);
9'. One, two and three dimensional arrays, at least, with the starting
    points of the indices set by the user; it may have arrays with
    ragged rows.

It may be possible to amalgamate these two sets of requirements to
some extent. However my package is definitely oriented towards the
first set. Even within the bounds set by the first set of requirements
there is substantial opportunity for variation between what different
matrix packages might provide.
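The expression style of requirement 1 can be sketched with a toy
value-semantics matrix class. This is purely illustrative: the class
name Mat and its layout are my own inventions for the sketch, not the
package's classes, though the 1-based A(i,j) and t() notation follows
the conventions described in this document.

```cpp
#include <cstddef>
#include <vector>

// Toy matrix class (illustrative only, not the package's own code)
// just rich enough to evaluate  A = B * (C + D.t());
struct Mat {
    int nrows, ncols;
    std::vector<double> v;                 // elements stored by row
    Mat(int r, int c) : nrows(r), ncols(c), v(r * c, 0.0) {}
    // 1-based element access, as in the package's convention
    double& operator()(int i, int j) { return v[(i - 1) * ncols + (j - 1)]; }
    double operator()(int i, int j) const { return v[(i - 1) * ncols + (j - 1)]; }
    Mat t() const {                        // transpose
        Mat r(ncols, nrows);
        for (int i = 1; i <= nrows; ++i)
            for (int j = 1; j <= ncols; ++j) r(j, i) = (*this)(i, j);
        return r;
    }
};

Mat operator+(const Mat& a, const Mat& b) {
    Mat r(a.nrows, a.ncols);
    for (std::size_t k = 0; k < r.v.size(); ++k) r.v[k] = a.v[k] + b.v[k];
    return r;
}

Mat operator*(const Mat& a, const Mat& b) {
    Mat r(a.nrows, b.ncols);               // starts zeroed
    for (int i = 1; i <= a.nrows; ++i)
        for (int j = 1; j <= b.ncols; ++j)
            for (int k = 1; k <= a.ncols; ++k) r(i, j) += a(i, k) * b(k, j);
    return r;
}
```

A class written this naive way evaluates each operator immediately and
creates a temporary at every step; the later sections on memory
management and two stage evaluation describe how the package avoids
those costs.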
It is not possible to build a matrix package that will meet everyone's
requirements. In many cases, if you put in one facility you impose
overheads on everyone using the package, both in the storage required
for the program and in efficiency. Likewise a package that is
optimised towards handling large matrices is likely to be less
efficient for very small matrices, where the administration time for
the matrix may become significant compared with the time to carry out
the operations. It is better to provide a variety of (hopefully
compatible) packages so that most users can find one that meets their
requirements. This package is intended to be one of these packages;
but not all of them.

Since my background is in statistical methods, this package is
oriented towards the kinds of things you need for statistical
analyses.

What size of matrices?
----------------------

A matrix package may target small matrices (say 3 x 3), or medium
sized matrices, or very large matrices. A package targeting very small
matrices will seek to minimise administration. A package for medium
sized or very large matrices can spend more time on administration in
order to conserve space or optimise the evaluation of expressions. A
package for very large matrices will need to pay special attention to
storage and numerical properties. This package is designed for medium
sized matrices. This means it is worth introducing some optimisations,
but I don't have to worry about the 64K limit that some compilers
impose.

Allow matrix expressions?
-------------------------

I want to be able to write matrix expressions the way I would on
paper. So if I want to multiply two matrices and then add the
transpose of a third one I can write something like

   X = A * B + C.t();

I want this expression to be evaluated with close to the same
efficiency as a hand-coded version. This is not so much of a problem
with expressions including a multiply, since the multiply will
dominate the time.
However, it is not so easy to achieve with expressions with just + and
- .

A second requirement is that temporary matrices generated during the
evaluation of an expression are destroyed as quickly as possible.

A desirable feature is that a certain amount of "intelligence" be
displayed in the evaluation of an expression. For example, in the
expression

   X = A.i() * B;

where i() denotes inverse, it would be desirable if the inverse wasn't
explicitly calculated.

Which matrix types?
-------------------

As well as the usual rectangular matrices, the matrices occurring
repeatedly in numerical calculations are upper and lower triangular
matrices, symmetric matrices and diagonal matrices. This is
particularly the case in calculations involving least squares and
eigenvalue calculations. So as a first stage these were the types I
decided to include.

It is also necessary to have the types row vector and column vector.
In a "matrix" package, in contrast to an "array" package, it is
necessary to have both of these types since they behave differently in
matrix expressions. The vector types can be derived from the
rectangular matrix type, so having them does not greatly increase the
complexity of the package.

The problem with having several matrix types is the number of versions
of the binary operators one needs. If one has 5 distinct matrix types
then a simple package will need 25 versions of each of the binary
operators. In fact, we can evade this problem, but at the cost of some
complexity.

What element types?
-------------------

Ideally we would allow element types double, float, complex and int,
at least. It would be reasonably easy, using templates or equivalent,
to provide a package which could handle a variety of element types.
However, as soon as one starts implementing the binary operators
between matrices with different element types, one again gets an
explosion in the number of operations one needs to consider. Hence I
decided to implement only one element type.
But the user can decide whether this is float or double. The package
assumes elements are of type Real. The user typedefs Real to float or
double.

In retrospect, it would not be too hard to include matrices with
complex elements. It might also be worth including symmetric and
triangular matrices with extra precision elements (double or long
double), to be used for storage only and with a minimum of operations
defined. These would be used for accumulating the results of sums of
squares and product matrices or multistage Householder
triangularisations.

Naming convention
-----------------

How are classes and public member functions to be named? As a general
rule I have spelt identifiers out in full with individual words being
capitalised. For example "UpperTriangularMatrix". If you don't like
this you can #define or typedef shorter names. This convention means
you can select an abbreviation scheme that makes sense to you.

The convention causes problems for Glockenspiel C++ on a PC feeding
into Microsoft C. The names Glockenspiel generates exceed the 32
characters recognised by Microsoft C and ambiguities result. So it is
necessary to #define shorter names.

Exceptions to the general rule are the functions for transpose and
inverse. To make matrix expressions more like the corresponding
mathematical formulae, I have used the single letter abbreviations
t() and i().

Row and Column index ranges
---------------------------

In mathematical work matrix subscripts usually start at one. In C,
array subscripts start at zero. In Fortran, they start at one. The
possibilities for this package were to make them start at 0 or 1 or be
arbitrary. Alternatively one could specify an "index set" for indexing
the rows and columns of a matrix. One would be able to add or multiply
matrices only if the appropriate row and column index sets were
identical.
In fact, I adopted the simpler convention of making the rows and
columns of a matrix be indexed by an integer starting at one,
following the traditional convention. In an earlier version of the
package I had them starting at zero, but even I was getting mixed up
when trying to use this earlier package. So I reverted to the more
usual notation.

Structure of matrix objects
---------------------------

Each matrix object contains the basic information, such as the number
of rows and columns and a status variable, plus a pointer to the data
array, which is on the heap.

Data storage - one block or several
-----------------------------------

In this package the elements of the matrix are stored as a single
array. Alternatives would have been to store each row as a separate
array or a set of adjacent rows as a separate array. The present
solution simplifies the program but limits the size of matrices in
systems that have a 64k byte (or other) limit on the size of arrays.
The large arrays may also cause problems for memory management in
smaller machines.

Data storage - by row or by column or other
-------------------------------------------

In Fortran two dimensional arrays are stored by column. In most other
systems they are stored by row. I have followed this latter
convention. This makes it easier to interface with other packages
written in C but harder to interface with those written in Fortran.
This may have been a wrong decision. Most work on the efficient
manipulation of large matrices is being done in Fortran. It would have
been easier to use this work if I had adopted the Fortran convention.

An alternative would be to store the elements by mid-sized rectangular
blocks. This might impose less strain on memory management when one
needs to access both rows and columns.

Storage of symmetric matrices
-----------------------------

Symmetric matrices are stored as lower triangular matrices.
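The two storage conventions just described come down to simple index
arithmetic. The following sketch (illustrative helper names, not the
package's internals) shows where element (i,j), with 1-based
subscripts, lives in the single data array: first for a rectangular
matrix stored by row, then for a symmetric matrix whose lower triangle
is packed row by row.

```cpp
// Rectangular matrix stored by row: rows 1..i-1 each contribute
// ncols elements before row i begins.
int rect_offset(int i, int j, int ncols) {
    return (i - 1) * ncols + (j - 1);
}

// Symmetric matrix stored as its lower triangle, packed by row:
// row i holds i stored elements, so rows 1..i-1 contribute
// i*(i-1)/2 elements before row i begins.
int lower_offset(int i, int j) {
    if (j > i) { int t = i; i = j; j = t; }  // symmetry: (i,j) == (j,i)
    return i * (i - 1) / 2 + (j - 1);
}
```

The swap in lower_offset is what lets a single stored triangle serve
the whole symmetric matrix.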
The decision was pretty arbitrary, but it does slightly simplify the
Cholesky decomposition program.

Element access - method and checking
------------------------------------

We want to be able to use the notation A(i,j) to specify the (i,j)-th
element of a matrix. This is the way mathematicians expect to address
the elements of matrices. I consider the notation A[i][j] totally
alien, although I may include it in a later version to help people
converting from C.

There are two ways of working out the address of A(i,j). One is to use
a "dope" vector which contains the first address of each row.
Alternatively you can calculate the address using the formula
appropriate for the structure of A. I use this second approach. It is
probably slower, but saves worrying about an extra bit of storage.

The other question is whether to check that i and j are in range. I do
carry out this check, following years of experience with both systems
that do and systems that don't do this check. I would hope that the
routines I supply with this package will reduce your need to access
elements of matrices, so speed of access is not a high priority.

Use iterators?
--------------

Iterators are an alternative way of providing fast access to the
elements of an array or matrix when they are to be accessed
sequentially. They need to be customised for each type of matrix. I
have not implemented iterators in this package, although some
iterator-like functions are used for some row and column functions.

Memory management - reference counting or status variable?
----------------------------------------------------------

Consider the instruction

   X = A + B + C;

To evaluate this, a simple program will add A to B, putting the total
in a temporary T1. Then it will add T1 to C, creating another
temporary T2, which will be copied into X. T1 and T2 will sit around
till the end of the block.
It would be faster if the program recognised that T1 was temporary and
stored the sum of T1 and C back into T1 instead of creating T2, and
then avoided the final copy by just assigning the contents of T1 to X
rather than copying. In this case there will be no temporaries
requiring deletion. (More precisely, there will be a header to be
deleted but no contents.)

For an instruction like

   X = (A * B) + (C * D);

we can't easily avoid one temporary being left over, so we would like
this temporary deleted as quickly as possible.

I provide the functionality for doing this by attaching a status
variable to each matrix. This indicates whether the matrix is
temporary, so that its memory is available for recycling or deleting.
Any matrix operation checks the status variables of the matrices it is
working with and recycles or deletes any temporary memory.

An alternative or additional approach would be to use delayed copying.
If a program requests a matrix to be copied, the copy is delayed until
an instruction is executed which modifies the memory of either the
original matrix or the copy. This saves the unnecessary copying in the
previous examples. However, it does not provide the additional
functionality of my approach. It would be possible to have both
delayed copy and the tagging of temporaries, but this seemed an
unnecessary complexity. In particular, delayed copy mechanisms seem to
require two calls to the heap manager, using extra time and making it
more difficult to build a memory compacting mechanism.

Evaluation of expressions - use two stage method?
-------------------------------------------------

Consider the instruction

   X = B - X;

The simple program will subtract X from B, store the result in a
temporary T1 and copy T1 into X. It would be faster if the program
recognised that the result could be stored directly into X. This would
happen automatically if the program could look at the instruction
first and mark X as temporary.
C programmers would expect to avoid the same problem with

   X = X - B;

by using an operator -= (which I haven't provided, yet)

   X -= B;

However this is an unnatural notation for non-C users, and it is much
nicer to write

   X = X - B;

and know that the program will carry out the simplification.

Another example where this intelligent analysis of an instruction is
helpful is in

   X = A.i() * B;

where i() denotes inverse. Numerical analysts know it is inefficient
to evaluate this expression by carrying out the inverse operation and
then the multiply. Yet it is a convenient way of writing the
instruction. It would be helpful if the program recognised this
expression and carried out the more appropriate approach.

To carry out this "intelligent" analysis of an instruction, matrix
expressions are evaluated in two stages. In the first stage a tree
representation of the expression is formed. For example (A+B)*C is
represented by the tree

          *
         / \
        +   C
       / \
      A   B

Rather than adding A and B, the + operator yields an object of a class
"AddedMatrix" which is just a pair of pointers to A and B. Then the *
operator yields a "MultipliedMatrix" which is a pair of pointers to
the "AddedMatrix" and C. The tree is examined for any simplifications
and then evaluated recursively.

Further possibilities not yet included are to recognise A.t()*A and
A.t()+A as symmetric, or to improve the efficiency of evaluation of
expressions like A+B+C, A*B*C, A*B.t() [t() denotes transpose].

One of the disadvantages of the two-stage approach is that the types
of matrix expressions are determined at run-time. So the compiler will
not detect errors of the type

   Matrix M; DiagonalMatrix D; ....; D = M;

We don't allow conversions using = when information would be lost.
Such errors will be detected only when the statement is executed.
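The two stages can be sketched in a few lines. This is a stripped-down
illustration of the mechanism, not the package's own code: here an
"AddedMatrix" node is literally just a pair of pointers, and the whole
expression is evaluated only when it is assigned to a matrix, at which
point the result can be written straight into the target.

```cpp
#include <cstddef>
#include <vector>

struct Mat;

// Stage one: the tree node built by operator+ (a pair of pointers,
// as described above; names illustrative).
struct AddedMatrix {
    const Mat* a;
    const Mat* b;
};

struct Mat {
    std::vector<double> v;                    // elements (layout irrelevant here)
    Mat(std::size_t n = 0) : v(n, 0.0) {}
    Mat& operator=(const AddedMatrix& e);     // stage two: evaluate the tree
};

AddedMatrix operator+(const Mat& a, const Mat& b) {
    AddedMatrix e = { &a, &b };               // no addition performed yet
    return e;
}

Mat& Mat::operator=(const AddedMatrix& e) {
    v.resize(e.a->v.size());
    for (std::size_t k = 0; k < v.size(); ++k)
        v[k] = e.a->v[k] + e.b->v[k];         // result written straight into *this
    return *this;
}
```

Because evaluation is deferred until the assignment, a statement like
X = X + B can write each element straight back into X with no
temporary at all, which is the kind of simplification discussed above.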
How to overcome an explosion in number of operations
----------------------------------------------------

The package attempts to solve the problem of the large number of
versions of the binary operations required when one has a variety of
types. With n types of matrices the binary operations will each
require n-squared separate algorithms. Some reduction in the number
may be possible by carrying out conversions. However the situation
rapidly becomes impossible with more than 4 or 5 types.

Doug Lea told me that it was possible to avoid this problem. I don't
know what his solution is. Here's mine.

Each matrix type includes routines for extracting individual rows or
columns. I assume a row or column consists of a sequence of zeros, a
sequence of stored values and then another sequence of zeros. Only a
single algorithm is then required for each binary operation. The rows
can be located very quickly since most of the matrices are stored row
by row. Columns must be copied, so access to them is somewhat slower.
As far as possible my algorithms access the matrices by row.

An alternative approach of using iterators would be slower, since the
iterators would involve a virtual function access at each step.

There is another approach. Each of the matrix types defined in this
package can be set up so that both rows and columns have their
elements at equal intervals, provided we are prepared to store the
rows and columns in up to three chunks. With such an approach one
could write a single "generic" algorithm for each of multiply and add.
This would be a reasonable alternative to my approach.

I provide several algorithms for operations like + . If one is adding
two matrices of the same type then there is no need to access the
individual rows or columns, and a faster general algorithm is
appropriate.

Generally the method works well. However symmetric matrices are not
always handled very efficiently (yet) since complete rows are not
stored explicitly.
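The row-extraction idea can be sketched as follows. The names RowSlice
and add_rows are my own for this illustration, not the package's: a
row is described by a count of leading zeros plus its stored values
(trailing zeros are implied by the matrix width), and one addition
routine then serves every pair of matrix types that can deliver their
rows in this form.

```cpp
#include <vector>

// A row of any matrix type, in the form assumed above: leading
// zeros, stored values, trailing zeros. (Illustrative names.)
struct RowSlice {
    int skip;                    // number of leading zeros
    std::vector<double> val;     // the stored values
};

// The single addition algorithm: works for any two row types of the
// same total width. The result's stored band covers the union of the
// two input bands.
RowSlice add_rows(const RowSlice& a, const RowSlice& b) {
    int lo  = a.skip < b.skip ? a.skip : b.skip;
    int ahi = a.skip + (int)a.val.size();
    int bhi = b.skip + (int)b.val.size();
    int hi  = ahi > bhi ? ahi : bhi;
    RowSlice r;
    r.skip = lo;
    r.val.assign(hi - lo, 0.0);
    for (int k = 0; k < (int)a.val.size(); ++k) r.val[a.skip - lo + k] += a.val[k];
    for (int k = 0; k < (int)b.val.size(); ++k) r.val[b.skip - lo + k] += b.val[k];
    return r;
}
```

For example, a row of an upper triangular matrix (zeros, then values
to the end) and a row of a diagonal matrix (zeros, one value, zeros)
can be added by this one routine, with no algorithm written
specifically for that pair of types.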
The original version of the package did not use this access by row or
column method and provided the multitude of algorithms for the
combinations of different matrix types. The code file length turned
out to be just a little longer than the present one when providing the
same facilities with 5 distinct types of matrices. It would have been
very difficult to increase the number of matrix types in the original
version. Apparently 4 to 5 types is about the break-even point for
switching to the approach adopted in the present package.

However it must also be admitted that there is a substantial overhead
in the approach adopted in the present package for small matrices. The
test program developed for the original version of the package takes
30 to 50% longer to run with the current version. This is for matrices
in the range 6x6 to 10x10. To try to improve the situation a little I
do provide an ordinary matrix multiplication routine for the case when
all the matrices involved are rectangular.

Destruction of temporaries
--------------------------

Versions before version 5 of newmat did not work correctly with Gnu
C++. This was because the tree structure used to represent a matrix
expression was set up on the stack. This was fine for AT&T, Borland
and Zortech C++. However Gnu C++ destroys temporary structures as soon
as the function that accesses them finishes. The other compilers wait
until the end of the current expression or current block.

To overcome this problem, there is now an option to store the
temporaries forming the tree structure on the heap (created with new)
and to delete them explicitly. Activate the definition of
TEMPS_DESTROYED_QUICKLY to set this option. In fact, I suggest this be
the default option since, with it, the package uses less storage and
runs faster.

There still exist situations where Gnu C++ will go wrong. These
include statements like

   A = X * Matrix(P * Q);
   Real r = (A*B)(3,4);

Neither of these kinds of statements will occur often in practice.
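The heap-based scheme can be illustrated in miniature. This sketch is
not the package's code: it just shows the principle that tree nodes
are created with new and deleted explicitly by the evaluation routine
itself, so the package controls exactly when each temporary dies
rather than relying on when a particular compiler destroys its stack
temporaries.

```cpp
// Illustrative expression-tree node, allocated on the heap. A leaf
// carries a value; an interior node (here always an addition) carries
// two children.
struct Node {
    double value;
    Node* left;
    Node* right;                          // null pointers mark a leaf
    Node(double v) : value(v), left(0), right(0) {}
    Node(Node* l, Node* r) : value(0), left(l), right(r) {}
};

// Evaluation consumes the tree: each node is deleted as soon as its
// value has been used, so no temporary outlives the evaluation.
double evaluate(Node* n) {
    double result = n->value;
    if (n->left) result = evaluate(n->left) + evaluate(n->right);
    delete n;                             // temporary destroyed immediately
    return result;
}
```

After evaluate returns, every node of the tree has been freed; nothing
is left for the compiler to clean up at the end of the expression or
block.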
A calculus of matrix types
--------------------------

The program needs to be able to work out the class of the result of a
matrix expression. This is to check that a conversion is legal or to
determine the class of a temporary. To assist with this, a class
MatrixType is defined. Operators +, -, *, >= are defined on
MatrixType to calculate the types of the results of expressions or to
check that conversions are legal.

Error handling
--------------

The package now does have a moderately graceful exit from errors.
Originally I thought I would wait until exceptions became available in
C++. Trying to set up an error handling scheme that did not involve an
exception-like facility was just too complicated. Version 5 of this
package included the beginnings of such an approach. The approach in
the present package, attempting to implement C++ exceptions, is not
completely satisfactory, but seems a good interim solution.

The exception mechanism cannot clean up objects explicitly created by
new. This must be explicitly carried out by the package writer or the
package user. I have not yet done this with the present package, so
occasionally a little garbage may be left behind after an exception. I
don't think this is a big problem, but it is one that needs fixing.

I identify five categories of errors:

   Programming error   - eg illegal conversion of a matrix, subscript
                         out of bounds, matrix dimensions don't match;
   Illegal data error  - eg Cholesky of a non-positive definite
                         matrix;
   Out of space error  - "new" returns a null pointer;
   Convergence failure - an iterative operation fails to converge;
   Internal error      - failure of an internal check.

Some of these have subcategories, which are nicely represented by
derived exception classes. But I don't think it is a good idea for
package users to refer to these, as they may change in the next
release of the package.

Sparse matrices
---------------

The package does not yet support sparse matrices.
For sparse matrices there is going to be some kind of structure
vector. It is going to have to be calculated for the results of
expressions in much the same way that types are calculated. In
addition, a whole new set of row and column operations would have to
be written.

Sparse matrices are important for people solving large sets of
differential equations as well as being important for statistical and
operational research applications. I think a comprehensive matrix
package needs to implement sparse matrices.

Complex matrices
----------------

The package does not yet support matrices with complex elements. There
are at least two approaches to including these. One is to have
matrices with complex elements. This probably means making new
versions of the basic row and column operations for Real*Complex,
Complex*Complex, Complex*Real and similarly for + and -. This would be
OK, except that I am also going to have to do the same for sparse
matrices, and when you put these together the whole thing will get out
of hand.

The alternative is to represent a complex matrix by a pair of Real
matrices. One probably needs another level of decoding of expressions,
but I think it might still be simpler than the first approach. There
is going to be a problem with accessing elements, and the final
package may be a little less efficient than one using the previous
representation.

Complex matrices are used extensively by electrical engineers and
really should be fully supported in a comprehensive package.
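The pair-of-real-matrices representation rests on a standard identity:
if the complex matrix is A + iB, then (A + iB)(C + iD) =
(AC - BD) + i(AD + BC), so a complex product costs four real products
and two real additions. The sketch below is illustrative only (the
names RMat, CpxMat and the helper functions are mine, not the
package's), using flat n x n real matrices stored by row.

```cpp
#include <cstddef>
#include <vector>

typedef std::vector<double> RMat;        // n x n real matrix, stored by row

RMat mul(const RMat& a, const RMat& b, int n) {
    RMat r(n * n, 0.0);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            for (int k = 0; k < n; ++k)
                r[i * n + j] += a[i * n + k] * b[k * n + j];
    return r;
}

RMat add(const RMat& a, const RMat& b) {
    RMat r(a.size());
    for (std::size_t k = 0; k < r.size(); ++k) r[k] = a[k] + b[k];
    return r;
}

RMat sub(const RMat& a, const RMat& b) {
    RMat r(a.size());
    for (std::size_t k = 0; k < r.size(); ++k) r[k] = a[k] - b[k];
    return r;
}

struct CpxMat { RMat re, im; };          // the pair of real matrices

// Complex product from four real products:
// (A + iB)(C + iD) = (AC - BD) + i(AD + BC)
CpxMat mul(const CpxMat& x, const CpxMat& y, int n) {
    CpxMat r;
    r.re = sub(mul(x.re, y.re, n), mul(x.im, y.im, n));
    r.im = add(mul(x.re, y.im, n), mul(x.im, y.re, n));
    return r;
}
```

The appeal of this approach is visible here: complex multiply, add and
subtract all reduce to the existing real operations, so the real row
and column machinery is reused instead of being triplicated for the
Real*Complex, Complex*Complex and Complex*Real cases.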